Create dead-links-check.yml #137
dcaldr wants to merge 3 commits into center-for-threat-informed-defense:develop from
CI that checks for dead links
Thank you for making a PR!! 🤩 Links are a tricky thing. For sites we own (MITRE & CTID), it makes sense to check them, and this is a great call-out. For sites we do not own... we will probably always come up with errors. Here is the reason: vendors (the main suppliers of reports) can and do remove published reports 💔. Annoying, but since it's their report, it's also their right. It's not uncommon for us to be using a report during development and suddenly find the report 💨 gone 😿. Our workaround ❤️🩹 has been to download reports earmarked as useful so we do not rely on the online version. This way, if anyone has questions regarding citations, we can promptly provide the documentation even if the links are broken 🔗. However, GitHub is not the best place for document storage, so we don't upload those here. Any thoughts on other solutions? I haven't looked too deeply into this project yet, but it's now on my docket. If there is a way to ignore some links while verifying others, that would be helpful. This is also a good call-out for a documentation update. Thank you! 🙏
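A minimal sketch of one way to skip some links while still verifying the rest, assuming the lychee version bundled with lycheeverse/lychee-action@v1.8.0. lychee's `--exclude` option takes a regular expression, and URLs matching it are not checked; every other link still is. `bitdefender\.com` is the false positive mentioned in this PR, while the step name and the second pattern (`example-vendor\.com/reports/`) are hypothetical placeholders. lychee's documentation also describes a `.lycheeignore` file (one regex per line) as an alternative that keeps the patterns out of the workflow file.

```yaml
- name: Check links (skipping selected third-party sites)  # hypothetical step name
  uses: lycheeverse/lychee-action@v1.8.0
  with:
    # Input globs come first; the --exclude regexes go last so they cannot
    # be mistaken for inputs. Matching URLs are skipped, all others checked.
    # example-vendor\.com/reports/ is a placeholder, not a real pattern.
    args: >-
      --verbose --no-progress
      './**/*.md' './**/*.html' './**/*.rst'
      --exclude 'bitdefender\.com'
      --exclude 'example-vendor\.com/reports/'
```

The globs are placed before `--exclude` as a precaution: depending on the lychee version, that option may accept several values in a row and could otherwise consume the globs as patterns.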
ci: add: arguments to workflow and clean workflow test commits
add: accept code 403 as not error
Signed-off-by: dcaldr <22105838+dcaldr@users.noreply.github.com>
I did some trial-and-error testing with the tool. For dead links, it can also suggest their archived versions in the Wayback Machine (Internet Archive). It is much slower, but it gets the job done.
uses: lycheeverse/lychee-action@v1.8.0
with:
-  args: " --suggest --verbose --no-progress './**/*.md' './**/*.html' './**/*.rst' --exclude-mail -a 429 --exclude-path *fin7/Resources/Step7/BOOSTWRITE-src/curl/README.md "
+  args: " --suggest --verbose --no-progress './**/*.md' './**/*.html' './**/*.rst' --exclude-mail -a 403,429 --exclude-path *fin7/Resources/Step7/BOOSTWRITE-src/curl/README.md "
- `--suggest` adds Wayback Machine links for dead links.
- `--verbose --no-progress` set the output format.
- `'./**/*.md'` etc. target only the selected file types (not `.c` files, for example).
- `--include-verbatim` could be added to also check links inside Markdown code blocks.
- `-a` treats HTTP status codes 403 and 429 as successful. Bitdefender and about two other sites return these codes (because they require cookies and JavaScript); this could perhaps be replaced with a site-specific `--exclude`.
- `--exclude-path` skips one file that is encoded in UTF-16 (possibly a bug), because the link checker crashes on it. It still crashes very rarely, even now that the file is excluded.
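Putting the arguments above together, the full workflow file could take roughly the shape sketched below. Only the `uses:` line and the `args` values come from this PR; the workflow name, the triggers (`workflow_dispatch` plus a weekly `schedule`), the checkout step and `fail: true` are assumptions. The folded scalar (`>-`) joins its lines with single spaces, so lychee should receive the same single-line argument string as the quoted version.

```yaml
# Hypothetical sketch of .github/workflows/dead-links-check.yml.
name: Dead links check

on:
  workflow_dispatch:          # assumption: allow manual runs
  schedule:
    - cron: "0 6 * * 1"       # assumption: weekly, Mondays 06:00 UTC

jobs:
  link-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Check links with lychee
        uses: lycheeverse/lychee-action@v1.8.0
        with:
          fail: true          # assumption: fail the job when broken links are found
          # --suggest: propose Wayback Machine copies for dead links (slow)
          # --verbose --no-progress: log output suited to CI
          # quoted globs: only Markdown, HTML and reStructuredText files
          # --exclude-mail: skip e-mail addresses
          # -a 403,429: treat these status codes as success
          # --exclude-path: skip the UTF-16 curl README that crashes lychee
          args: >-
            --suggest --verbose --no-progress
            './**/*.md' './**/*.html' './**/*.rst'
            --exclude-mail -a 403,429
            --exclude-path *fin7/Resources/Step7/BOOSTWRITE-src/curl/README.md
```

A scheduled trigger is one option for link checks, since links can break without any change to the repository.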
CI that checks for dead links, as suggested in Issue #60. I used the work from https://github.com/lycheeverse/lychee-action.
It has some false positives, e.g. www.bitdefender.com returning a 403 error, which I'm not able to fix, but most of the reported links are genuinely broken. Further additions could be using a cache or trying to auto-resolve links via the Internet Archive, as presented in the additional command-line arguments.
I could put in more time and effort, but as this is my first pull request, I'm not sure whether it's useful.
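For the cache idea, the lychee-action README describes persisting lychee's `.lycheecache` file between runs with `actions/cache`, so results for recently checked links can be reused instead of requesting them again. The sketch below adapts that approach; it assumes both steps run in the same job after the checkout step, and the cache key names and the one-day `--max-cache-age` are illustrative choices, not values from this PR.

```yaml
# Restore the lychee cache saved by an earlier run, if one exists.
- name: Restore lychee cache
  uses: actions/cache@v3
  with:
    path: .lycheecache
    key: cache-lychee-${{ github.sha }}
    restore-keys: cache-lychee-

- name: Check links with lychee
  uses: lycheeverse/lychee-action@v1.8.0
  with:
    # --cache reads and writes .lycheecache; cached entries older than
    # --max-cache-age (here one day, an assumption) are checked again.
    args: >-
      --cache --max-cache-age 1d
      --verbose --no-progress
      './**/*.md' './**/*.html' './**/*.rst'
```

Because the key includes `github.sha`, each run saves a new cache entry, and `restore-keys` lets the next run fall back to the most recent entry with the `cache-lychee-` prefix.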